Fragen zu Beantworten:
1. Wer sind unsere Kunden?
2. In welchen Ländern ist unser Service beliebt
3. Was bringt uns das meiste Geld?
4. Welche Bereiche erfordern Investitionen?
5. Welche Projekte bringen uns den größten Gewinn?
6. Ist unser Projekt erfolgreich? Können unsere Kunden Investoren finden?
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
# # Wichtig, wenn das Notebook exportiert wird in HTML, dann werden die Grafiken auch im HTML angezeigt
# from plotly.offline import init_notebook_mode, iplot
# init_notebook_mode(connected=True)
df = pd.read_pickle("df_after_preprocessing.pkl")
df.head(2)
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | ... | difference_usd | difference_pct | success_type | Continent_Name | Three_Letter_Country_Code | gdp | percPoverty | pop | loan_cat | project_size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 300.0 | Fruits & Vegetables | Food | To buy seasonal, fresh fruits to sell. | PK | Pakistan | PKR | 12.0 | 12 | ... | 0.0 | 100.0 | successful | Asia | PAK | 1188.859985 | 24.299999 | 220892336.0 | 250-500 USD | medium |
| 1 | 575.0 | 575.0 | Rickshaw | Transportation | to repair and maintain the auto rickshaw used ... | PK | Pakistan | PKR | 11.0 | 14 | ... | 0.0 | 100.0 | successful | Asia | PAK | 1188.859985 | 24.299999 | 220892336.0 | 500-1,000 USD | large |
2 rows × 26 columns
for_visual = df.groupby(['loan_cat'], as_index=False).size()
fig = px.pie(
for_visual,
values='size',
names='loan_cat',
color='loan_cat',
color_discrete_map={
'< 250 USD':'#98c1d9' ,
'250-500 USD':'#3d5a80',
'500-1,000 USD':'#A9F4F6',
'1,000-2,000 USD':'#ee6c4d',
'2,000-5,000 USD':'#7F3D49',
'>5,000 USD': '#293241'
},
title='Wie viel Geld brauchen unsere Kunden für ihre Projekte?'
)
fig.update_traces(textposition='auto', textinfo='percent + label')
fig.update_layout(legend_title_text='Kreditbetrag')
fig.show()
--- Der größte Teil (ca. 78 %) entfällt auf Kredite bis zu 1000 USD.
--- Ein Viertel der Projekte liegt unter 250 USD.
--- Kredite über 1000 USD machen 22 % aus, einschließlich der Kredite über 5000 EUR, die etwa 1,5 % der Gesamtzahl der Kredite ausmachen.
---> Grundsätzlich benötigen unsere Klienten einen relativ geringen Geldbetrag
df["is_goal_achieved"] = df.loc[:, 'difference_usd'].apply(lambda x: "yes" if x>=0 else "no")
for_visual = df.groupby(by='is_goal_achieved', as_index=False).size()
for_visual
| is_goal_achieved | size | |
|---|---|---|
| 0 | no | 48328 |
| 1 | yes | 622877 |
fig = px.bar(data_frame=for_visual,
x='is_goal_achieved',
y='size',
text = 'size',
title="Wie viele Projekte haben den vollen Kreditbetrag gesammelt?",
labels={'size':'Anzahl Projekte', 'is_goal_achieved':'Ist Ziel erreicht?'},
color_discrete_sequence=["#98c1d9", '#ee6c4d'])
fig.update_traces(texttemplate='%{text:,}', textposition='outside')
fig.update_traces(showlegend=False)
fig.show()
--- 622,877 Projekte haben es geschafft, den gesamten Betrag zu sammeln.
--- 48,328 Projekte konnten nicht den vollen Betrag sammeln. Aber das bedeutet nicht, dass sie kein Geld bekommen haben. Weiter schauen wir diese Projekte präzise an.
for_visual = df.groupby('success_type', as_index=False).size().sort_values('size')
for_visual
| success_type | size | |
|---|---|---|
| 0 | fail | 3541 |
| 1 | partially | 44787 |
| 2 | successful | 622877 |
fig = px.pie(
for_visual,
values='size',
names='success_type',
color='success_type',
title='Ist unsere Plattform erfolgreich? Können unsere Kunden Kreditgeber finden?',
color_discrete_map={
'successful':'#3d5a80',
'partially':'#ee6c4d',
'fail':'#293241'
}
)
fig.update_traces(hole=.4)
fig.update_layout(
# Add annotations in the center of the donut pies.
annotations=[dict(text='Erfolg?', x=0.5, y=0.5, font_size=20, showarrow=False)])
fig.show()
--- Fast 93% der Projekte auf unserer Plattform haben den erforderlichen Geldbetrag erhaltet.
--- 6,7 Prozent der Projekte haben den Betrag teilweise aufgebracht.
--- Nur 0,5 Prozent der Projekte konnten kein Geld aufbringen.
---> Wir können behaupten, dass unsere Plattform in fast 100% der Fälle Menschen hilft, Geld für ihre Projekte zu finden.
unsuccessful_projects = df.loc[df.loc[:,'is_goal_achieved'] == 'no'].copy()
unsuccessful_projects.head(2)
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | ... | difference_pct | success_type | Continent_Name | Three_Letter_Country_Code | gdp | percPoverty | pop | loan_cat | project_size | is_goal_achieved | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 87 | 4275.0 | 5000.0 | Personal Housing Expenses | Housing | to pave the ground and repair the ceiling, to ... | PS | Palestine | USD | 39.0 | 58 | ... | 85.500000 | partially | Asia | PSE | 3239.729980 | 29.200001 | 4803269.0 | 2,000-5,000 USD | extrem large | no |
| 112 | 1925.0 | 2400.0 | Electronics Repair | Services | to pay the annual rent for his shop | IQ | Iraq | USD | 15.0 | 41 | ... | 80.209999 | partially | Asia | IRQ | 4145.859863 | 18.900000 | 40222504.0 | 2,000-5,000 USD | very large | no |
2 rows × 27 columns
liste_category = ['0%','< 25%','25-50%','50-75%', "> 75%"]
unsuccessful_projects["success_group"] = pd.cut(
unsuccessful_projects.loc[:, "difference_pct"],
bins = [-1, 0, 25, 50, 75, 100],
labels = liste_category)
unsuccessful_projects.head(2)
| funded_amount | loan_amount | activity | sector | use | country_code | country | currency | term_in_months | lender_count | ... | success_type | Continent_Name | Three_Letter_Country_Code | gdp | percPoverty | pop | loan_cat | project_size | is_goal_achieved | success_group | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 87 | 4275.0 | 5000.0 | Personal Housing Expenses | Housing | to pave the ground and repair the ceiling, to ... | PS | Palestine | USD | 39.0 | 58 | ... | partially | Asia | PSE | 3239.729980 | 29.200001 | 4803269.0 | 2,000-5,000 USD | extrem large | no | > 75% |
| 112 | 1925.0 | 2400.0 | Electronics Repair | Services | to pay the annual rent for his shop | IQ | Iraq | USD | 15.0 | 41 | ... | partially | Asia | IRQ | 4145.859863 | 18.900000 | 40222504.0 | 2,000-5,000 USD | very large | no | > 75% |
2 rows × 28 columns
unsuccessful_projects_gr = unsuccessful_projects.groupby(by=['success_group'], as_index=False).size().sort_values('success_group')
unsuccessful_projects_gr
| success_group | size | |
|---|---|---|
| 0 | 0% | 3383 |
| 1 | < 25% | 9822 |
| 2 | 25-50% | 13515 |
| 3 | 50-75% | 14214 |
| 4 | > 75% | 7394 |
fig = px.pie(
unsuccessful_projects_gr,
values='size',
names='success_group',
color='success_group',
title='Erfolglose Projekte. Prozentsatz der gesammelten Gelder',
color_discrete_map={
'0%':'#7F3D49',
'< 25%':'#A9F4F6 ',
'25-50%':'#98c1d9',
'50-75%':'#3d5a80',
'> 75%':'#ee6c4d'
}
)
fig.update_traces(textposition='inside', textinfo='percent + label')
fig.update_layout(legend_title_text='Prozent der<br>gesammelten Betrag')
fig.show()
unsuccessful_projects_gr.loc[unsuccessful_projects_gr.loc[:, 'success_group'] == '0%', 'size']
0 3383 Name: size, dtype: int64
f"{round(3383/ df.shape[0] * 100, 2)} %"
'0.5 %'
--> Nur sieben Prozent der Projekte konnten nichts sammeln und im Vergleich zum gesamten Datensatz nur 0,5 Prozent.
unsucces_vs_description = unsuccessful_projects.groupby(
['success_group', 'has_description'],
as_index=False,
observed=True).agg(number_of_projects=('has_description', 'size'))
unsucces_vs_description = unsucces_vs_description.merge(unsuccessful_projects_gr, how='left', left_on='success_group', right_on='success_group')
unsucces_vs_description['part_prc'] = round(unsucces_vs_description['number_of_projects'] / unsucces_vs_description['size'] * 100, 2)
unsucces_vs_description
| success_group | has_description | number_of_projects | size | part_prc | |
|---|---|---|---|---|---|
| 0 | 0% | no | 131 | 3383 | 3.87 |
| 1 | 0% | yes | 3252 | 3383 | 96.13 |
| 2 | < 25% | no | 355 | 9822 | 3.61 |
| 3 | < 25% | yes | 9467 | 9822 | 96.39 |
| 4 | 25-50% | no | 107 | 13515 | 0.79 |
| 5 | 25-50% | yes | 13408 | 13515 | 99.21 |
| 6 | 50-75% | no | 74 | 14214 | 0.52 |
| 7 | 50-75% | yes | 14140 | 14214 | 99.48 |
| 8 | > 75% | no | 21 | 7394 | 0.28 |
| 9 | > 75% | yes | 7373 | 7394 | 99.72 |
fig = px.bar(
unsucces_vs_description,
x='success_group',
y='part_prc',
text='part_prc',
color='has_description',
barmode='group',
color_discrete_sequence=["#ee6c4d", "#3d5a80"],
title='Verteilung der Projekte mit und ohne Beschreibungen nach dem Prozentsatz der gesammelten Gelder',
labels={
'part_prc':'Projekte %',
'success_group': 'Prozentualer Anteil der erhaltenen Finanzierung'
})
fig.update_traces(texttemplate='%{text:.2f}%', textposition='outside')
# Set legend title text
fig.update_layout(
legend_title_text='Enthält das Projekt<br>eine Beschreibung?')
fig.show()
| Förderkategorie | ohne Beschreibung |
|---|---|
| 0% | 3.87% |
| < 25% | 3.61% |
| 25-50% | 0.79% |
| 50-75% | 0.52% |
| > 75% | 0.28% |
--- Es besteht ein deutlicher Zusammenhang zwischen der Verfügbarkeit der Beschreibung und dem Prozentsatz der eingenommenen Gelder.
Gleichzeitig mit dem Rückgang des Prozentsatzes der Projekte ohne Beschreibung steigt der Prozentsatz des gesammelten Betrags.
---> Wir müssen unsere Website dahingehend verbessern, dass sie eine detaillierte Beschreibung der einzelnen Projekte enthält, da dies Auswirkungen auf den Erfolg haben kann.
successful_projects = df.loc[df.loc[:,'is_goal_achieved'] == 'yes'].copy()
for_visual = successful_projects.groupby(['loan_cat'], as_index=False,
observed=True).agg(number_of_projects=('loan_amount', 'size'), mean_lender=('lender_count', 'mean'))
for_visual
| loan_cat | number_of_projects | mean_lender | |
|---|---|---|---|
| 0 | < 250 USD | 162290 | 5.666098 |
| 1 | 250-500 USD | 188030 | 11.281460 |
| 2 | 500-1,000 USD | 145518 | 21.388296 |
| 3 | 1,000-2,000 USD | 81414 | 39.170720 |
| 4 | 2,000-5,000 USD | 37343 | 71.863027 |
| 5 | >5,000 USD | 8282 | 129.559044 |
fig = px.scatter(data_frame=for_visual,
y='loan_cat',x='mean_lender',
text='mean_lender',
size='number_of_projects',
size_max=45,
# size=[10]*6,
opacity=1,
title="Wie viele Kreditgeber müssen unsere Kunden überzeugen, um eine vollständige Finanzierung zu erhalten?",
labels={
'loan_cat':'Kreditkategorie',
'mean_lender':'mittlere Anzahl der Kreditgeber'
},
color_discrete_sequence=["#98c1d9"])
fig.update_traces(texttemplate='%{text:.0f}', textposition='middle right')
fig2 = px.line(data_frame=for_visual, y='loan_cat',x='mean_lender', markers=True)
fig2.update_traces(line=dict(color='#3d5a80', width=1.5))
fig.add_trace(fig2.data[0])
fig.update_traces(showlegend=False)
Der Großteil der Kredite liegt bei bis zu 1.000 Euro. Daher muss unser Kunde durchschnittlich 12 Kreditgeber finden.
Und für Kleinkredite bis zu 250 Euro werden im Durchschnitt 6 Kreditgeber benötigt.
--> Wir müssen damit werben, dass unsere Kunden nur eine sehr kleine Anzahl von Personen überzeugen müssen, damit ihr Projekt vollständig finanziert wird.
cont_projekt_num_df = df.groupby('Continent_Name', as_index=False).agg(projekt_num = ('loan_amount', 'size'))
unsuc_projekt_cont_df = unsuccessful_projects.groupby(['success_group', 'Continent_Name'], as_index=False).size()
unsuc_projekt_cont_df = unsuc_projekt_cont_df.merge(
cont_projekt_num_df, how='left', left_on='Continent_Name', right_on='Continent_Name')
unsuc_projekt_cont_df['unsuc_prc'] = round(unsuc_projekt_cont_df['size'] / unsuc_projekt_cont_df['projekt_num'] * 100, 2)
unsuc_projekt_cont_df.head()
| success_group | Continent_Name | size | projekt_num | unsuc_prc | |
|---|---|---|---|---|---|
| 0 | 0% | Africa | 1153 | 173038 | 0.67 |
| 1 | 0% | Asia | 773 | 308449 | 0.25 |
| 2 | 0% | Europe | 41 | 19352 | 0.21 |
| 3 | 0% | North America | 1167 | 83436 | 1.40 |
| 4 | 0% | Oceania | 27 | 7955 | 0.34 |
# fig = px.line(unsuc_projekt_cont_df, x="success_group", y='unsuc_prc', markers=True,
# # marginal="box",
# color='Continent_Name',
# title="Verteilung des Prozentsatzes des gesammelten Geldes aus dem angegebenen Betrag",
# labels={'difference_pct':'Prozentsatz des gesammelten Geldes aus dem erforderlichen Betrag, %',
# 'Continent_Name':'Kontinent',
# 'success_group': 'Prozentualer Anteil der erhaltenen Finanzierung',
# 'unsuc_prc': '% erfolgsosen Projekte' },
# color_discrete_map={
# 'Asia':'#3d5a80' ,
# 'Africa':'#98c1d9',
# 'North America':'#0B696C',
# 'South America':'#ee6c4d',
# 'Europe':'#7F3D49',
# 'Oceania': '#293241'
# }
# )
# fig.show()
cont_projekt_stat = unsuc_projekt_cont_df.groupby(['Continent_Name', 'projekt_num'], as_index=False).agg(
unsuc_prc = ('unsuc_prc', 'sum'), unsuc_num=('size', 'sum')).sort_values('unsuc_prc')
cont_projekt_stat['projekt_prc'] = round(cont_projekt_stat['projekt_num'] / sum(cont_projekt_stat['projekt_num']) * 100, 2)
cont_projekt_stat['suc_prc'] = 100 - cont_projekt_stat['unsuc_prc']
cont_projekt_stat['suc_num'] = cont_projekt_stat['projekt_num'] - cont_projekt_stat['unsuc_num']
cont_projekt_stat.sort_values('projekt_num', ascending=False, inplace=True)
fig = make_subplots(rows=1, cols=2,
specs=[[{"type": "bar"}, {"type": "bar"}]],
subplot_titles=("Verhältnis zwischen erfolgreichen<br>und erfolglosen Projekten pro Kontinent", "Anzahl Projekte nach Kontinent"))
fig1 = px.bar(data_frame=cont_projekt_stat, x='Continent_Name',y=['unsuc_prc', 'suc_prc'],
# text='suc_prc',
color_discrete_map={'unsuc_prc':'#F3927A', 'suc_prc':'#506280'},
# labels={'unsuc_prc' :'erfolglos', 'suc_prc' : 'erfolgreich'}
)
fig1.data[0]['text'] = round(cont_projekt_stat['unsuc_prc'],2)
fig1.data[1]['text'] = round(cont_projekt_stat['suc_prc'],2)
fig1.update_traces(texttemplate='%{text:,}%', textposition='inside', textfont_size=8)
# fig2 = px.pie(cont_projekt_stat, values='projekt_num', names='Continent_Name', color='Continent_Name',
# color_discrete_map={
# 'Asia':'#3d5a80' ,
# 'Africa':'#98c1d9',
# 'North America':'#A9F4F6 ',
# 'South America':'#ee6c4d',
# 'Europe':'#7F3D49',
# 'Oceania': '#293241'
# })
fig2 = px.bar(data_frame=cont_projekt_stat, x='Continent_Name',y=['unsuc_num', 'suc_num'],
color_discrete_map={'unsuc_num':'#F3927A', 'suc_num':'#506280'},
)
fig2.data[1]['text'] = cont_projekt_stat['projekt_num']
fig2.update_traces(textposition='outside', textfont_size=8)
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig1.data[1], row=1, col=1)
fig.add_trace(fig2.data[0], row=1, col=2)
fig.add_trace(fig2.data[1], row=1, col=2)
fig.update_traces(showlegend=False, row=1,col=1)
fig.update_layout(barmode='relative')
fig.update_yaxes(title_text = "%", row=1,col=1)
fig.update_yaxes(title_text = "Anzahl Projekte", row=1,col=2)
fig.show()
# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])
# Add traces
fig.add_trace(
go.Bar(x=cont_projekt_stat['Continent_Name'], y=cont_projekt_stat['unsuc_num'], marker_color='#F3927A',
name='Erfolgreiche Projekte Anz.'),
secondary_y=False,
)
fig.add_trace(
go.Bar(x=cont_projekt_stat['Continent_Name'], y=cont_projekt_stat['suc_num'], marker_color='#506280',
name='Erfolglose Projekte Anz.'),
secondary_y=False,
)
fig.add_trace(
go.Scatter(x=cont_projekt_stat['Continent_Name'], y=cont_projekt_stat['unsuc_prc'], name='Erfolglose Projekte %',
line=dict(color='#293241', width=2)),
secondary_y=True,
)
# Add figure title
fig.update_layout(
title_text="Verhältnis zwischen erfolgreichen und erfolglosen Projekten pro Kontinent"
)
# Set x-axis title
fig.update_xaxes(title_text="Kontinent")
# Set y-axes titles
fig.update_yaxes(title_text="Anzahl Projekte", secondary_y=False)
fig.update_yaxes(title_text="% erfolglose Projekte", secondary_y=True, range=[0, 100])
fig.update_layout(barmode='relative')
fig.show()
--- Der Prozentsatz der erfolgreich finanzierten Projekte ist auf den Kontinenten höher, auf denen der Service besser entwickelt ist. In Asien und Afrika sind es zum Beispiel 95,5 % bzw. 93 %.
| Kontinent | Anz. Projekte | Erfolglose Projekte, % |
|---|---|---|
| Asia | 308,449 | 4.42% |
| Africa | 173,038 | 7.05% |
| North America | 83,436 | 14.62% |
| South America | 78,975 | 8.6% |
| Europe | 19,352 | 14.53% |
| Oceania | 7,955 | 9.25% |
# country_suc_unsuc_df = df.groupby(['country'], as_index=False, observed=True).agg(
# projekt_num = ('loan_amount', 'size')).sort_values('projekt_num', ascending=False)
# country_suc_unsuc_df = country_suc_unsuc_df.merge(
# unsuccessful_projects.groupby(['country'], as_index=False).agg(unsuc_projekt_num = ('loan_amount', 'size')),
# how='left',
# left_on='country', right_on='country')
# country_suc_unsuc_df = country_suc_unsuc_df.merge(
# successful_projects.groupby(['country'], as_index=False).agg(suc_projekt_num = ('loan_amount', 'size')),
# how='left',
# left_on='country', right_on='country')
# country_suc_unsuc_df['unsuc_projekt_prc'] = round(
# country_suc_unsuc_df['unsuc_projekt_num'] / country_suc_unsuc_df['projekt_num'] *100, 2)
# country_suc_unsuc_df_top10 = country_suc_unsuc_df.loc[:10, :].copy()
# country_suc_unsuc_df_top10
for_visual = df.groupby('sector', as_index=False).agg(funded_sum = ('funded_amount', 'sum'),
count = ('sector','size')).sort_values('count', ascending=False)
fig = make_subplots(rows=1, cols=2, specs=[[{"type": "pie"}, {"type": "pie"}]],
subplot_titles=("Gesamtanzahl der Projekte", "Gesamtkreditbetrag"))
fig1 = px.pie(
for_visual, values='count', names='sector')
fig2 = px.pie(
for_visual, values='funded_sum', names='sector')
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=1, col=2)
fig.update_layout(title="Daten nach Sektoren",template='ggplot2')
fig.update_layout(legend=dict(
xanchor="right",
x = 0
))
fig.show()
--- TOP 3 Sectors: 'Agriculture', 'Food', 'Retail'
--- Für einen genaueren Vergleich sollten Sie eine andere Form der Grafik verwenden
df_sector = for_visual.copy()
df_sector.head()
| sector | funded_sum | count | |
|---|---|---|---|
| 0 | Agriculture | 133770632.0 | 180302 |
| 6 | Food | 115089208.0 | 136657 |
| 11 | Retail | 90864856.0 | 124494 |
| 12 | Services | 42966084.0 | 45140 |
| 10 | Personal Use | 14189775.0 | 36385 |
sum_funded = df_sector.loc[:, 'funded_sum'].sum()
sum_count = df_sector.loc[:, 'count'].sum()
print(sum_funded, " ", sum_count)
527563800.0 671205
df_sector['funded_%'] = round(df_sector.loc[:, 'funded_sum'] / sum_funded * 100, 2)
df_sector['count_%'] = round(df_sector.loc[:, 'count'] / sum_count * 100, 2)
df_sector.sort_values('count', ascending=False)
df_sector.head()
| sector | funded_sum | count | funded_% | count_% | |
|---|---|---|---|---|---|
| 0 | Agriculture | 133770632.0 | 180302 | 25.360001 | 26.86 |
| 6 | Food | 115089208.0 | 136657 | 21.820000 | 20.36 |
| 11 | Retail | 90864856.0 | 124494 | 17.219999 | 18.55 |
| 12 | Services | 42966084.0 | 45140 | 8.140000 | 6.73 |
| 10 | Personal Use | 14189775.0 | 36385 | 2.690000 | 5.42 |
fig = go.Figure(data=[
go.Bar(
name='Kreditbetrag, %',
y=df_sector['sector'],
x=df_sector['funded_%'],
marker_color='#F3927A',
text = df_sector['funded_%'], orientation='h'),
go.Bar(
name='Anzahl Projekte, %', marker_color='#506280',
y=df_sector['sector'], x=df_sector['count_%'], text = df_sector['count_%'], orientation='h')
])
# Change the bar mode
fig.update_traces(texttemplate='%{text:.1f}%', textposition='outside') #:.2f
fig.update_layout(barmode='group', title="Gesamtzahl der Projekte und Gesamtkreditbetrag nach Sektoren",
height=800) #, width=1000)#,
fig.update_yaxes(categoryorder = "total ascending", title_text = "Sektor")
fig.update_xaxes(range=[0, 29], title_text = "Anteil an der Gesamtzahl, %")
fig.update_layout(legend=dict(
yanchor="bottom",
y=0.01,
xanchor="right",
x = 0.99
))
fig.show()
--- TOP3 Sektors: 'Agriculture', 'Food', 'Retail'
--- Die Anzahl der Projekte und die Höhe des gesammelten Geldes sind für jeden Sektor proportional gleich
--- Deutlicher prozentualer Unterschied (2,7%) nur für den Sektor 'Personal Use'
#TOP5
top5_country = list(df.groupby('country', as_index=False)
.size()
.sort_values('size', ascending=False)
.reset_index().loc[:4, 'country'])
top5_country
['Philippines', 'Kenya', 'El Salvador', 'Cambodia', 'Pakistan']
#TOP5
top5_country = df.groupby('country', as_index=False).size().sort_values('size', ascending=False).reset_index(drop=True).loc[:4, :]
top5_country_l = list(top5_country.loc[:4, 'country'])
top5_country_l
['Philippines', 'Kenya', 'El Salvador', 'Cambodia', 'Pakistan']
top5_country_df = df.loc[df['country'].isin(top5_country_l), :].copy()
top5_country_gr = top5_country_df.groupby(
['country', 'sector'], as_index=False, observed=True
).size().sort_values(['country', 'size'], ascending=False)
top5_country_gr = top5_country_gr.merge(top5_country, left_on='country', right_on='country', how='left')
top5_country_gr['size_prc'] = round(top5_country_gr['size_x'] / top5_country_gr['size_y'] * 100, 2)
top5_country_gr
| country | sector | size_x | size_y | size_prc | |
|---|---|---|---|---|---|
| 0 | Philippines | Retail | 53581 | 160441 | 33.40 |
| 1 | Philippines | Food | 42700 | 160441 | 26.61 |
| 2 | Philippines | Agriculture | 36776 | 160441 | 22.92 |
| 3 | Philippines | Housing | 6785 | 160441 | 4.23 |
| 4 | Philippines | Transportation | 4568 | 160441 | 2.85 |
| ... | ... | ... | ... | ... | ... |
| 70 | Cambodia | Health | 46 | 34836 | 0.13 |
| 71 | Cambodia | Manufacturing | 42 | 34836 | 0.12 |
| 72 | Cambodia | Clothing | 41 | 34836 | 0.12 |
| 73 | Cambodia | Wholesale | 7 | 34836 | 0.02 |
| 74 | Cambodia | Entertainment | 7 | 34836 | 0.02 |
75 rows × 5 columns
for_visual = top5_country_gr.groupby('country').head(3).sort_values(['country', 'size_x'], ascending=False).reset_index(drop=True)
fig = px.bar(for_visual, x='size_x', y='country', title="TOP 3 Sektoren für TOP 5 Länder",
color = 'sector', orientation='h', text='size_x',
hover_name="sector", hover_data=["country", "size_x", 'size_prc'],
color_discrete_map={
'Retail':'#98c1d9' ,
'Food':'#3d5a80',
'Agriculture':'#ee6c4d',
'Services':'#293241',
'Transportation':'#109DA2',
'Housing': '#7F3D49',
'Personal Use':'#A9F4F6'
},
labels={'country' : 'Land', 'sector' : 'Sektor', 'size_x' : 'Anzahl Projekte'})
fig.update_traces(texttemplate='%{text}', textposition='inside')
fig.update_yaxes(categoryorder = "total descending")
fig.show()
print(df.shape[0] * 0.02)
print(df.shape[0] * 0.005)
13424.1 3356.025
for_visual = df.groupby(by='country', as_index=False).size().sort_values(by='size', ascending=False)
percent_other = 1
num = df.shape[0]
num_other = sum(for_visual.loc[for_visual.loc[:, 'size'] < num*percent_other/100, 'size'])
for_visual.drop(for_visual.loc[for_visual.loc[:, 'size'] < num*percent_other/100, :].index, axis=0, inplace=True)
for_visual['country_patrent'] = for_visual.loc[:, 'size'].apply(lambda x: '< 2%' if x < num*0.02 else '')
for_visual = for_visual.append({'country': 'other (< 1%)', 'size' : num_other, 'country_patrent' : '< 2%'}, ignore_index=True)
fig = px.sunburst(
for_visual,
path=['country_patrent', 'country'],
values='size',
title= 'Woher kommen unsere Kunden?',
template='ggplot2'
# color_discrete_sequence= px.colors.sequential.matter_r
)
fig.update_layout(margin = dict(t=25, l=0, r=0, b=0))
fig.update_traces(textinfo='label+percent entry', insidetextorientation='radial')
fig.show()
--> Etwa 24 Prozent der Spendenanfragen wurden auf den Philippinen gestellt
--> Als nächstes kommen Kenia (11%), El Salvador (6%), Kambodscha (5%) und Pakistan (4%)
for_visual = df.groupby(by=['Continent_Name', 'country'], as_index=False).size().sort_values(by='size', ascending=False)
fig = px.sunburst(
for_visual,
path=['Continent_Name', 'country'],
values='size',
title= 'Anzahl der Projekte nach Kontinenten und Ländern',
template='ggplot2'
# color_discrete_sequence= px.colors.sequential.matter
)
fig.show()
--> Die meisten unserer Kunden leben in Asien und Afrika.
for_visual = df.groupby(['gdp', 'country', 'Continent_Name'], as_index=False, observed=True).size()
for_visual['ratio'] = for_visual['size'] / for_visual['gdp']
for_visual.loc[for_visual['size'] > 22000, 'country_to_show'] = for_visual.loc[:, 'country']
fig =px.line(for_visual, x='gdp', y='size',
hover_name="country", text='country_to_show',
markers=True,
log_x=True,
color_discrete_sequence=["#98c1d9"],
title='Anzahl der Projekte VS Höhe des BIPs',
labels={'gdp' : 'log. Pro-Kopf-BIP, $', 'size':'Anzahl Projekte'})
fig.update_traces(marker=dict(size=4.5, color='#3d5a80', opacity=0.75))
fig.update_traces(textposition='top center')
fig.show()
--- Das Schaubild zeigt, dass der Großteil der Projekte auf Länder mit einem Pro-Kopf-BIP von weniger als 7.000 US-Dollar entfällt.
Philippines - 3,298.83 $
Kenya - 1,878.58 $
El Salvadoe - 3,798.64 $
for_visual = df.groupby(by=['Three_Letter_Country_Code', 'country'], as_index=False, observed=True).agg(funded_sum = ('funded_amount', 'sum'),
count = ('Three_Letter_Country_Code', 'size'))
for_visual.head()
| Three_Letter_Country_Code | country | funded_sum | count | |
|---|---|---|---|---|
| 0 | PAK | Pakistan | 12467100.0 | 26857 |
| 1 | IND | India | 6466850.0 | 11237 |
| 2 | KEN | Kenya | 32248404.0 | 75825 |
| 3 | NIC | Nicaragua | 9854375.0 | 11781 |
| 4 | SLV | El Salvador | 23357724.0 | 39875 |
fig = px.choropleth(for_visual, locations="Three_Letter_Country_Code",
color="count", # lifeExp is a column of gapminder
hover_name="country", # column to add to hover information
color_continuous_scale=px.colors.sequential.matter,
labels={'count':'Anzahl der Projekte'})
fig.update_layout(title="Anzahl der Projekte nach Ländern")
fig.show()
fig = px.choropleth(for_visual, locations="Three_Letter_Country_Code",
color="funded_sum", # lifeExp is a column of gapminder
hover_name="country", # column to add to hover information
color_continuous_scale=px.colors.sequential.matter,
labels={'funded_sum':'Investitionen, USD'}
)
fig.update_layout(title="Investitionen nach Ländern")
fig.show()
--> Wir haben Kunden auf der ganzen Welt (Asia, Afrika, Nord- und Südamerika).
--> Unsere Kunden sind hauptsächlich in Entwicklungsländern ansässig.
--> Die meisten Spendenanfragen gingen von Bewohnern der Philippinen, Kenias, El Salvadors, Kambodschas und Pakistans ein.
--> Auf den Philippinen wurden 160.000 Spendenanträge gestellt.
--> Die meisten Investitionen wurden auf den Philippinen, in Kenia, Peru, Paraguay, El Salvador und den USA gesammelt.
--> Unser Projekt auf den Philippinen gesammelt Investitionen in Höhe von 54 Millionen US-Dollar.
for_visual = df.groupby(['country', 'loan_cat', 'Three_Letter_Country_Code'],
as_index=False,
observed=True).agg(Number_of_Loans=('loan_cat', 'size')).sort_values(['loan_cat'])
for_visual
| country | loan_cat | Three_Letter_Country_Code | Number_of_Loans | |
|---|---|---|---|---|
| 0 | Pakistan | < 250 USD | PAK | 2368 |
| 207 | The Democratic Republic of the Congo | < 250 USD | COD | 90 |
| 197 | Kosovo | < 250 USD | XXK | 3 |
| 188 | Georgia | < 250 USD | GEO | 41 |
| 182 | Indonesia | < 250 USD | IDN | 797 |
| ... | ... | ... | ... | ... |
| 237 | Turkey | >5,000 USD | TUR | 4 |
| 229 | Zimbabwe | >5,000 USD | ZWE | 4 |
| 223 | Somalia | >5,000 USD | SOM | 20 |
| 17 | Kenya | >5,000 USD | KEN | 126 |
| 426 | Bhutan | >5,000 USD | BTN | 2 |
427 rows × 4 columns
fig = px.choropleth(for_visual,
locations="Three_Letter_Country_Code",
color="Number_of_Loans",
hover_name="country",
color_continuous_scale=px.colors.sequential.matter,
animation_frame='loan_cat',
animation_group='Three_Letter_Country_Code',
labels={'Number_of_Loans':'Anzahl der Kredite', 'loan_cat':'Kreditgröße'},
range_color=(0, 10000),
)
fig.update_layout(title="Geografische Verteilung von Krediten in verschiedenen Kreditkategorien")
fig.show()
for_visual = df.groupby(by='Continent_Name', as_index=False).agg(funded_sum = ('funded_amount', 'sum'),
size = ('project_size','size')).sort_values(by='size', ascending=False)
fig = make_subplots(rows=1, cols=2,
specs=[[{"type": "pie"}, {"type": "pie"}]],
subplot_titles=("Total number of projects", "Investments"))
fig1 = px.pie(for_visual, values='size', names='Continent_Name', color='Continent_Name',
color_discrete_map={
'Asia':'#3d5a80' ,
'Africa':'#98c1d9',
'North America':'#A9F4F6',
'South America':'#ee6c4d',
'Europe':'#7F3D49',
'Oceania': '#293241'
})
fig2 = px.pie(for_visual, values='funded_sum', names='Continent_Name', color='Continent_Name',
color_discrete_map={
'Asia':'#3d5a80' ,
'Africa':'#98c1d9',
'North America':'#A9F4F6',
'South America':'#ee6c4d',
'Europe':'#7F3D49',
'Oceania': '#293241'
})
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=1, col=2)
fig.update_layout(legend=dict(
xanchor="right",
x = 0), title="Gesamtzahl der Projekte und Gesamtinvestitionen nach Kontinent")
fig.show()
--> 46% unserer Kunden aus asiatischen Ländern, 26% - aus Afrika, 12% - aus Nordamerikа und 12% - aus Südamerika
--> Wir haben fast keine Kunden aus Europa und Australien
--> Fast 72% der Kunden aus Asia und Afrika
--> Gleichzeitig wurden nur 57% der Investitionen in Asia und Afrika gesammelt.
--> Etwas mehr als ein Drittel der Investitionen wurde in Süd- und Nordamerika gesammelt
top5_country_df.columns
Index(['funded_amount', 'loan_amount', 'activity', 'sector', 'use',
'country_code', 'country', 'currency', 'term_in_months', 'lender_count',
'borrower_genders', 'repayment_interval', 'has_description',
'num_of_borrower', 'num_male', 'num_female', 'difference_usd',
'difference_pct', 'success_type', 'Continent_Name',
'Three_Letter_Country_Code', 'gdp', 'percPoverty', 'pop', 'loan_cat',
'project_size', 'is_goal_achieved'],
dtype='object')
top5_country_goal_df = top5_country_df.groupby(['country', 'is_goal_achieved'], as_index=False, observed=True).size()
top5_country_goal_df = top5_country_goal_df.loc[top5_country_goal_df['is_goal_achieved'] == 'no', ['country', 'size']]
top5_country_goal_df
| country | size | |
|---|---|---|
| 0 | Pakistan | 1937 |
| 2 | Kenya | 5446 |
| 4 | El Salvador | 7152 |
| 6 | Philippines | 2834 |
| 8 | Cambodia | 1372 |
df_top5_agg = top5_country_df.groupby(['country', 'pop', 'percPoverty'], as_index=False, observed=True).agg(
projekt_num = ('loan_amount', 'size'),
funded_amount_mean = ('funded_amount', 'mean'),
loan_amount_mean = ('loan_amount', 'mean'),
funded_amount_sum = ('funded_amount', 'sum')
)
df_top5_agg = df_top5_agg.merge(top5_country_goal_df, left_on='country', right_on='country')
df_top5_agg.rename(columns={"size": "unsuc_num"}, inplace=True)
df_top5_agg
| country | pop | percPoverty | projekt_num | funded_amount_mean | loan_amount_mean | funded_amount_sum | unsuc_num | |
|---|---|---|---|---|---|---|---|---|
| 0 | Pakistan | 220892336.0 | 24.299999 | 26857 | 464.203003 | 495.019928 | 12467100.0 | 1937 |
| 1 | Kenya | 53771300.0 | 36.099998 | 75825 | 425.300415 | 455.447418 | 32248404.0 | 5446 |
| 2 | El Salvador | 6486201.0 | 22.799999 | 39875 | 585.773682 | 667.176147 | 23357724.0 | 7152 |
| 3 | Philippines | 109581088.0 | 16.700001 | 160441 | 339.541473 | 344.938171 | 54476376.0 | 2834 |
| 4 | Cambodia | 16718971.0 | 17.700001 | 34836 | 540.162476 | 570.464478 | 18817100.0 | 1372 |
df_top5_agg['unsuc_prc'] = round(df_top5_agg['unsuc_num'] / df_top5_agg['projekt_num'] * 100, 2)
df_top5_agg['founded_pro_1tsd_person'] = round(df_top5_agg['funded_amount_sum'] / df_top5_agg['pop'] * 1000, 2)
df_top5_agg
| country | pop | percPoverty | projekt_num | funded_amount_mean | loan_amount_mean | funded_amount_sum | unsuc_num | unsuc_prc | founded_pro_1tsd_person | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pakistan | 220892336.0 | 24.299999 | 26857 | 464.203003 | 495.019928 | 12467100.0 | 1937 | 7.21 | 56.44 |
| 1 | Kenya | 53771300.0 | 36.099998 | 75825 | 425.300415 | 455.447418 | 32248404.0 | 5446 | 7.18 | 599.73 |
| 2 | El Salvador | 6486201.0 | 22.799999 | 39875 | 585.773682 | 667.176147 | 23357724.0 | 7152 | 17.94 | 3601.14 |
| 3 | Philippines | 109581088.0 | 16.700001 | 160441 | 339.541473 | 344.938171 | 54476376.0 | 2834 | 1.77 | 497.13 |
| 4 | Cambodia | 16718971.0 | 17.700001 | 34836 | 540.162476 | 570.464478 | 18817100.0 | 1372 | 3.94 | 1125.49 |
for_norm_df = df_top5_agg.loc[:,
['percPoverty', 'projekt_num', 'funded_amount_mean', 'unsuc_prc', 'founded_pro_1tsd_person', 'pop']]
for_norm_df
| percPoverty | projekt_num | funded_amount_mean | unsuc_prc | founded_pro_1tsd_person | pop | |
|---|---|---|---|---|---|---|
| 0 | 24.299999 | 26857 | 464.203003 | 7.21 | 56.44 | 220892336.0 |
| 1 | 36.099998 | 75825 | 425.300415 | 7.18 | 599.73 | 53771300.0 |
| 2 | 22.799999 | 39875 | 585.773682 | 17.94 | 3601.14 | 6486201.0 |
| 3 | 16.700001 | 160441 | 339.541473 | 1.77 | 497.13 | 109581088.0 |
| 4 | 17.700001 | 34836 | 540.162476 | 3.94 | 1125.49 | 16718971.0 |
df_top5_agg_norm = (for_norm_df-for_norm_df.mean())/for_norm_df.std()
# df_top5_agg_norm = (for_norm_df-for_norm_df.min())/(for_norm_df.max()-for_norm_df.min())
df_top5_agg_norm = df_top5_agg_norm.merge(df_top5_agg.loc[:, 'country'], left_index=True, right_index=True)
df_top5_agg_norm
| percPoverty | projekt_num | funded_amount_mean | unsuc_prc | founded_pro_1tsd_person | pop | country | |
|---|---|---|---|---|---|---|---|
| 0 | 0.100746 | -0.737464 | -0.070279 | -0.064013 | -0.795114 | 1.588346 | Pakistan |
| 1 | 1.624857 | 0.149598 | -0.472749 | -0.068838 | -0.409263 | -0.315826 | Kenya |
| 2 | -0.092997 | -0.501641 | 1.187438 | 1.661766 | 1.722371 | -0.854591 | El Salvador |
| 3 | -0.880884 | 1.682429 | -1.359973 | -0.938965 | -0.482131 | 0.320069 | Philippines |
| 4 | -0.751722 | -0.592923 | 0.715564 | -0.589950 | -0.035863 | -0.737999 | Cambodia |
df_top5_agg_norm_t = df_top5_agg_norm.melt(id_vars=["country"],
var_name="Parameter",
value_name="Value")
df_top5_agg_norm_t.head(10)
| country | Parameter | Value | |
|---|---|---|---|
| 0 | Pakistan | percPoverty | 0.100746 |
| 1 | Kenya | percPoverty | 1.624857 |
| 2 | El Salvador | percPoverty | -0.092997 |
| 3 | Philippines | percPoverty | -0.880884 |
| 4 | Cambodia | percPoverty | -0.751722 |
| 5 | Pakistan | projekt_num | -0.737464 |
| 6 | Kenya | projekt_num | 0.149598 |
| 7 | El Salvador | projekt_num | -0.501641 |
| 8 | Philippines | projekt_num | 1.682429 |
| 9 | Cambodia | projekt_num | -0.592923 |
fig = px.line_polar(df_top5_agg_norm_t, r="Value", theta="country", color='Parameter', line_close=True,
color_discrete_map={
'percPoverty':'#3d5a80' ,
'projekt_num':'#98c1d9',
'funded_amount_mean':'#109DA2',
'unsuc_prc':'#ee6c4d',
'founded_pro_1tsd_person':'#7F3D49',
'pop': '#293241'
})
fig.show()
#по континентам: средний размер займа, объем собранных денег, количество проектов,
--> Man kann sehen, dass die Linien des gesammelten und des angeforderten Geldes nahe beieinander liegen und stellenweise zusammenfallen. Wie bereits erwähnt, bestätigt dies den Erfolg unseres Service. Die meisten Projekte haben ihr Ziel erreicht.
--> Im Durchschnitt befinden sich die größten Projekte in Europa und Amerika. In Asien und Afrika Projekte meist für kleine Beträge in US-Dollar.
-->
df.groupby(by='project_size', as_index=True).size()
project_size without investments 3383 small 196004 medium 179202 large 157531 very large 81873 extrem large 53212 dtype: int64
for_visual = df.groupby(by='project_size', as_index=False).agg(funded_sum = ('funded_amount', 'sum'),
count = ('project_size','size'))
sum_funded = for_visual.loc[:, 'funded_sum'].sum()
sum_count = for_visual.loc[:, 'count'].sum()
print(sum_funded, " ", sum_count)
527563800.0 671205
for_visual['Investitionen, %'] = round(for_visual.loc[:, 'funded_sum'] / sum_funded * 100, 2)
for_visual['Anzahl der Projekte, %'] = round(for_visual.loc[:, 'count'] / sum_count * 100, 2)
fig = px.bar(data_frame=for_visual,x='project_size',y='funded_sum', text = 'count',
color_discrete_sequence=["#98c1d9"],
title="Wie viel Geld wurde gesammelt in der Abhängigkeit von der Größe des Projekts?",
labels={'project_size':'Projektgröße', 'funded_sum':'Gesamtinvestition in US-Dollar'})
fig.update_traces(texttemplate='%{text} Prj.', textposition='outside')
fig.update_traces(showlegend=False)
fig.show()
# for_visual.head()
df_test = for_visual.loc[:, ["project_size", "Anzahl der Projekte, %", 'Investitionen, %']].melt(id_vars=["project_size"],
var_name="Parameter",
value_name="Value")
df_test
| project_size | Parameter | Value | |
|---|---|---|---|
| 0 | without investments | Anzahl der Projekte, % | 0.500000 |
| 1 | small | Anzahl der Projekte, % | 29.200000 |
| 2 | medium | Anzahl der Projekte, % | 26.700000 |
| 3 | large | Anzahl der Projekte, % | 23.470000 |
| 4 | very large | Anzahl der Projekte, % | 12.200000 |
| 5 | extrem large | Anzahl der Projekte, % | 7.930000 |
| 6 | without investments | Investitionen, % | 0.000000 |
| 7 | small | Investitionen, % | 6.940000 |
| 8 | medium | Investitionen, % | 13.350000 |
| 9 | large | Investitionen, % | 21.969999 |
| 10 | very large | Investitionen, % | 20.840000 |
| 11 | extrem large | Investitionen, % | 36.900002 |
df_test.replace({'project_size': 'small'}, 'small<br>bis 275 USD', inplace=True)
df_test.replace({'project_size': 'medium'}, 'medium<br>275 - 500 USD', inplace=True)
df_test.replace({'project_size': 'large'}, 'large<br>500 - 1,000 USD', inplace=True)
df_test.replace({'project_size': 'very large'}, 'very large<br>1,000 - 1,950 USD', inplace=True)
df_test.replace({'project_size': 'extrem large'}, 'extrem large<br>1,950 - 10,000 USD', inplace=True)
# fig = go.Figure(data=[
# go.Bar(name='Investitionen, %', x=for_visual['project_size'], y=for_visual['funded_%'], text = for_visual['funded_%']),
# go.Bar(name='Projekte Gesamtzahl, %', x=for_visual['project_size'], y=for_visual['count_%'], text = for_visual['count_%'])
# ])
fig = px.bar(df_test, x="project_size", y="Value",
color='Parameter',
color_discrete_sequence=["#FFD166", "#118AB2"],
text='Value',
category_orders={"Parameter": ['Anzahl der Projekte, %', 'Investitionen, %']},
labels={ 'Value':'Value, %', 'project_size':'Projektgröße'},
)
# Change the bar mode
fig.update_traces(texttemplate='%{text:.1f}%', textposition='outside') #:.2f
fig.update_layout(barmode='group', title="Gesamtinvestitionen und Gesamtzahl der Projekte nach Projektgröße") #, width=1000)#,
# fig.update_xaxes(categoryorder = "total ascending", title_text = "project_size")
fig.update_layout(legend=dict(
yanchor="top",
y=0.98,
xanchor="left",
x = 0.45
))
# 'count_%':'Anzahl der Projekte, %', 'funded_%': 'Investitionen, %',
fig.add_annotation( # add a text callout with arrow
text="max. Einkommen!", x="extrem large<br>1,950 - 10,000 USD", y=38, arrowhead=5, showarrow=True
)
fig.add_annotation( # add a text callout with arrow
text="max. Anzahl von Projekten!", x="small<br>bis 275 USD", y=31, arrowhead=5, showarrow=True
)
fig.show()
# [-1, 0, 275, 500, 1000, 1949, 100000]
--> 37% der Investitionen wurden für 8% sehr großer Projekte gesammelt.
--> Umgekehrt sind 30% kleine Projekte, für die nur 7% der Gesamtinvestition gesammelt wurden.
--> Große Investitionsprojekte können nicht als Ausreißer betrachtet werden
--> In unserem Service können Menschen Investitionen bis zu 100.000 US-Dollar erhalten
df.columns
Index(['funded_amount', 'loan_amount', 'activity', 'sector', 'use',
'country_code', 'country', 'currency', 'term_in_months', 'lender_count',
'borrower_genders', 'repayment_interval', 'has_description',
'num_of_borrower', 'num_male', 'num_female', 'difference_usd',
'difference_pct', 'success_type', 'Continent_Name',
'Three_Letter_Country_Code', 'gdp', 'percPoverty', 'pop', 'loan_cat',
'project_size', 'is_goal_achieved'],
dtype='object')
# fig = px.scatter(df, y= 'loan_amount', x = 'percPoverty')
# fig.show()
#Зависимость индекса бедности и размера кредита
df_proverty = df.loc[df['percPoverty'].notnull(), :].copy()
df_proverty.shape
(670695, 27)
df_proverty['percPoverty'].describe()
count 670695.000000 mean 26.435173 std 11.894269 min 0.600000 25% 16.700001 50% 22.799999 75% 36.099998 max 76.400002 Name: percPoverty, dtype: float64
cats = ['0%-20%','20%-30%','30%-50%', '50%-100%']
# cats = ['0%-10%','10%-30%','30%-45%','45%-60%', '0%-100%']
def_bins =[0,20, 30,50,100]
# def_bins =[0,10,30,45,60, 100]
df_proverty['poverty'] = pd.cut(df_proverty['percPoverty'], bins=def_bins, labels=cats)
# sektor_gr_df = df_proverty.groupby(['sector'], as_index=False, observed=True).size()
poverty_gr_df = df_proverty.groupby(['poverty'], as_index=False, observed=True).size()
sector_vs_proverty_df = df_proverty.groupby(['sector', 'poverty'], as_index=False, observed=True).agg(
#gdp_mean = ('gdp', 'mean'),
funded_amount_mean = ('funded_amount', 'mean'),
GKV = ('funded_amount', 'sum'),
project_num = ('loan_amount', 'size')
).merge(poverty_gr_df, left_on='poverty', right_on='poverty').sort_values('size', ascending=False)
sector_vs_proverty_df.head(10)
| sector | poverty | funded_amount_mean | GKV | project_num | size | |
|---|---|---|---|---|---|---|
| 0 | Food | 0%-20% | 550.369934 | 27023716.0 | 49101 | 236876 |
| 8 | Clothing | 0%-20% | 934.724976 | 4418445.0 | 4727 | 236876 |
| 1 | Transportation | 0%-20% | 582.443542 | 3197615.0 | 5490 | 236876 |
| 14 | Entertainment | 0%-20% | 1528.186646 | 573070.0 | 375 | 236876 |
| 13 | Housing | 0%-20% | 561.755676 | 8848775.0 | 15752 | 236876 |
| 12 | Personal Use | 0%-20% | 292.734131 | 6544950.0 | 22358 | 236876 |
| 10 | Health | 0%-20% | 1054.052979 | 1313350.0 | 1246 | 236876 |
| 9 | Construction | 0%-20% | 912.322205 | 1282725.0 | 1406 | 236876 |
| 11 | Education | 0%-20% | 1001.551208 | 7389445.0 | 7378 | 236876 |
| 7 | Retail | 0%-20% | 446.418793 | 26296744.0 | 58906 | 236876 |
sector_vs_proverty_df['sector_prc_prj'] = sector_vs_proverty_df['project_num'] / sector_vs_proverty_df['size'] * 100
sector_vs_proverty_df.sort_values(['sector', 'poverty'], inplace=True)
fig = px.line(sector_vs_proverty_df, y= 'sector_prc_prj', x = 'sector', color='poverty', markers=True ,
color_discrete_map={
'0%-20%':'#3d5a80',
'20%-30%':'#98c1d9',
'30%-50%':'#ee6c4d',
'50%-100%':'#7F3D49',
})
# fig.update_xaxes(categoryorder = "total descending")
fig.update_traces(line=dict(width=1.75))
fig.show()
--- Für Länder mit einem hohen Armutsindex (50%-100%):
- 35% aller Projekte sind im Segment 'Food'
- 24.4% - im Segment 'Retail'
- 11.2% - 'Clothing'
- 10.5% - 'Agriculture'
credit_size_vs_proverty_df = df_proverty.groupby(['loan_cat', 'poverty'], as_index=False, observed=True).agg(
#gdp_mean = ('gdp', 'mean'),
funded_amount_mean = ('funded_amount', 'mean'),
funded_amount_sum = ('funded_amount', 'sum'),
project_num = ('loan_amount', 'size')
)
credit_size_vs_proverty_df
| loan_cat | poverty | funded_amount_mean | funded_amount_sum | project_num | |
|---|---|---|---|---|---|
| 0 | < 250 USD | 0%-20% | 182.998184 | 16407800.0 | 89661 |
| 1 | < 250 USD | 20%-30% | 194.321259 | 3737575.0 | 19234 |
| 2 | < 250 USD | 30%-50% | 167.248062 | 7224280.0 | 43195 |
| 3 | < 250 USD | 50%-100% | 163.629608 | 1817925.0 | 11110 |
| 4 | 250-500 USD | 0%-20% | 364.319427 | 26872564.0 | 73761 |
| 5 | 250-500 USD | 20%-30% | 385.306244 | 25741924.0 | 66809 |
| 6 | 250-500 USD | 30%-50% | 368.613586 | 17835000.0 | 48384 |
| 7 | 250-500 USD | 50%-100% | 351.468384 | 2585050.0 | 7355 |
| 8 | 500-1,000 USD | 0%-20% | 684.447998 | 28599660.0 | 41785 |
| 9 | 500-1,000 USD | 20%-30% | 716.494202 | 52005300.0 | 72583 |
| 10 | 500-1,000 USD | 30%-50% | 677.145142 | 27470424.0 | 40568 |
| 11 | 500-1,000 USD | 50%-100% | 715.563171 | 4955275.0 | 6925 |
| 12 | 1,000-2,000 USD | 0%-20% | 1277.587036 | 24919336.0 | 19505 |
| 13 | 1,000-2,000 USD | 20%-30% | 1289.952271 | 64932324.0 | 50337 |
| 14 | 1,000-2,000 USD | 30%-50% | 1236.950073 | 25893076.0 | 20933 |
| 15 | 1,000-2,000 USD | 50%-100% | 1303.960571 | 7169175.0 | 5498 |
| 16 | 2,000-5,000 USD | 0%-20% | 2655.140381 | 26293856.0 | 9903 |
| 17 | 2,000-5,000 USD | 20%-30% | 2874.956055 | 57372624.0 | 19956 |
| 18 | 2,000-5,000 USD | 30%-50% | 3111.080078 | 30127700.0 | 9684 |
| 19 | 2,000-5,000 USD | 50%-100% | 3194.875244 | 12686850.0 | 3971 |
| 20 | >5,000 USD | 0%-20% | 6739.429688 | 15237850.0 | 2261 |
| 21 | >5,000 USD | 20%-30% | 6151.070312 | 19111376.0 | 3107 |
| 22 | >5,000 USD | 30%-50% | 6372.585449 | 18079024.0 | 2837 |
| 23 | >5,000 USD | 50%-100% | 6945.254883 | 9258025.0 | 1333 |
fig = px.bar(credit_size_vs_proverty_df, y= 'funded_amount_sum', x = 'poverty', color='loan_cat',
labels = {'funded_amount_sum' : 'Gesamtkreditbetrag', 'poverty' : 'Armutindex', 'loan_cat' : 'Kreditgröße'},
color_discrete_sequence= px.colors.sequential.matter, title='Gesamtkreditbetrag nach Armutsindex')
fig.show()
fig = make_subplots(
rows=1, cols=2,
specs=[[{'type':'bar'},{'type':'bar'}]],
horizontal_spacing = 0.09,
subplot_titles=('Gesamtkreditbetrag nach Armutsindex',
"Anzahl Kredite nach Armutsindex"))
fig1 = px.bar(credit_size_vs_proverty_df, y= 'poverty', x = 'funded_amount_sum', color='loan_cat', orientation='h',
labels = {'funded_amount_sum' : 'Gesamtkreditbetrag', 'poverty' : 'Armutindex', 'loan_cat' : 'Kreditgröße'},
color_discrete_sequence= px.colors.sequential.matter)
fig2 = px.bar(credit_size_vs_proverty_df, y= 'poverty', x = 'project_num', color='loan_cat', orientation='h',
labels = {'project_num' : 'Gesamtkreditbetrag', 'poverty' : 'Armutindex', 'loan_cat' : 'Kreditgröße'},
color_discrete_sequence= px.colors.sequential.matter)
fig.add_trace(fig1.data[0],row=1, col=1)
fig.add_trace(fig1.data[1],row=1, col=1)
fig.add_trace(fig1.data[2],row=1, col=1)
fig.add_trace(fig1.data[3],row=1, col=1)
fig.add_trace(fig1.data[4],row=1, col=1)
fig.add_trace(fig1.data[5],row=1, col=1)
fig.add_trace(fig2.data[0],row=1, col=2)
fig.add_trace(fig2.data[1],row=1, col=2)
fig.add_trace(fig2.data[2],row=1, col=2)
fig.add_trace(fig2.data[3],row=1, col=2)
fig.add_trace(fig2.data[4],row=1, col=2)
fig.add_trace(fig2.data[5],row=1, col=2)
fig.update_traces(showlegend=False, row=1,col=2)
fig.update_yaxes(showticklabels=False, row=1,col=1)
fig.update_yaxes(title_text = "Armutindex", row=1,col=1)
fig.update_xaxes(title_text = "Gesamtkreditbetrag, USD", row=1,col=1, autorange="reversed")
fig.update_xaxes(title_text = "Anzahl Projekte", row=1,col=2)
fig.update_layout(legend=dict(
orientation="h",
yanchor="bottom", y=-0.3,
xanchor="left", x=0),
legend_title_text= 'Kreditgröße', height=600)
fig.show()
In Ländern mit Armutindex 0%-20%
Die höchste Anzahl von Projekten unter 250 USD im Vergleich zu Ländern mit einem höheren Armutsindex.
---> Vielleicht spiegelt der Armutsindex nicht das finanzielle Wohlergehen der Bevölkerung eines Landes wider. Es wäre zu erwarten, dass in Ländern mit niedrigen Armutsindex weniger Mikrokredite vergeben werden.
df.gdp.describe()
count 671205.000000 mean 3452.065186 std 6013.789062 min 238.990005 25% 1543.670044 50% 3298.830078 75% 3798.639893 max 63413.511719 Name: gdp, dtype: float64
cats = ['<1,5K USD','1,5-2,5K USD', '2,5-5K USD', '> 5K USD']
def_bins =[0, 1500,2500,5000, 65000]
df['gdp_cat'] = pd.cut(df_proverty['gdp'], bins=def_bins, labels=cats)
credit_size_vs_gdp_df = df.groupby(['loan_cat', 'gdp_cat'], as_index=False, observed=True).agg(
#gdp_mean = ('gdp', 'mean'),
funded_amount_mean = ('funded_amount', 'mean'),
funded_amount_sum = ('funded_amount', 'sum'),
project_num = ('loan_amount', 'size')
)
# credit_size_vs_gdp_df
fig = make_subplots(
rows=1, cols=2,
specs=[[{'type':'bar'},{'type':'bar'}]],
horizontal_spacing = 0.11,
subplot_titles=('Gesamtkreditbetrag nach GDP',
"Anzahl Kredite nach GDP"))
fig1 = px.bar(credit_size_vs_gdp_df, y= 'gdp_cat', x = 'funded_amount_sum', color='loan_cat', orientation='h',
labels = {'funded_amount_sum' : 'Gesamtkreditbetrag', 'gdp_cat' : 'GDP', 'loan_cat' : 'Kreditgröße'},
color_discrete_sequence= px.colors.sequential.matter)
fig2 = px.bar(credit_size_vs_gdp_df, y= 'gdp_cat', x = 'project_num', color='loan_cat', orientation='h',
labels = {'project_num' : 'Gesamtkreditbetrag', 'gdp_cat' : 'GDP', 'loan_cat' : 'Kreditgröße'},
color_discrete_sequence= px.colors.sequential.matter)
fig.add_trace(fig1.data[0],row=1, col=1)
fig.add_trace(fig1.data[1],row=1, col=1)
fig.add_trace(fig1.data[2],row=1, col=1)
fig.add_trace(fig1.data[3],row=1, col=1)
fig.add_trace(fig1.data[4],row=1, col=1)
fig.add_trace(fig1.data[5],row=1, col=1)
fig.add_trace(fig2.data[0],row=1, col=2)
fig.add_trace(fig2.data[1],row=1, col=2)
fig.add_trace(fig2.data[2],row=1, col=2)
fig.add_trace(fig2.data[3],row=1, col=2)
fig.add_trace(fig2.data[4],row=1, col=2)
fig.add_trace(fig2.data[5],row=1, col=2)
fig.update_traces(showlegend=False, row=1,col=2)
fig.update_yaxes(showticklabels=False, row=1,col=1)
fig.update_yaxes(title_text = "GDP", row=1,col=1)
fig.update_xaxes(title_text = "Gesamtkreditbetrag, USD", row=1,col=1, autorange="reversed")
fig.update_xaxes(title_text = "Anzahl Projekte", row=1,col=2)
fig.update_layout(legend=dict(
orientation="h",
yanchor="bottom", y=-0.3,
xanchor="left", x=0),
legend_title_text= 'Kreditgröße', height=600)
fig.show()
# fig = make_subplots(
# rows=1, cols=2,
# specs=[[{'type':'barpolar'},{'type':'barpolar'}]],
# subplot_titles=('Gesamtkreditbetrag nach Armutsindex',
# "Anzahl Kredite nach Armutsindex"))
# fig1 = px.bar_polar(credit_size_vs_proverty_df, r="funded_amount_sum", theta="poverty",
# color="loan_cat", color_discrete_sequence= px.colors.sequential.matter)
# fig2 = px.bar_polar(credit_size_vs_proverty_df, r="project_num", theta="poverty",
# color="loan_cat", color_discrete_sequence= px.colors.sequential.matter)
# fig.add_trace(fig1.data[0],row=1, col=1)
# fig.add_trace(fig1.data[1],row=1, col=1)
# fig.add_trace(fig1.data[2],row=1, col=1)
# fig.add_trace(fig1.data[3],row=1, col=1)
# fig.add_trace(fig1.data[4],row=1, col=1)
# fig.add_trace(fig1.data[5],row=1, col=1)
# fig.add_trace(fig2.data[0],row=1, col=2)
# fig.add_trace(fig2.data[1],row=1, col=2)
# fig.add_trace(fig2.data[2],row=1, col=2)
# fig.add_trace(fig2.data[3],row=1, col=2)
# fig.add_trace(fig2.data[4],row=1, col=2)
# fig.add_trace(fig2.data[5],row=1, col=2)
# fig.update_traces(showlegend=False, row=1,col=1)
# fig.update_layout(legend_title_text= 'Kreditgröße')
# fig.update_polars(angularaxis_rotation=45)
# fig.update_polars(angularaxis_direction= 'clockwise')
# # fig.update_layout(legend=dict(
# # orientation="h",
# # yanchor="top", y=0,
# # xanchor="left", x=0))
# fig.show()
gdp_gr_df = df.groupby(['gdp_cat'], as_index=False, observed=True).size()
sector_vs_gdp_df = df.groupby(['sector', 'gdp_cat'], as_index=False, observed=True).agg(
#gdp_mean = ('gdp', 'mean'),
funded_amount_mean = ('funded_amount', 'mean'),
GKV = ('funded_amount', 'sum'),
project_num = ('loan_amount', 'size')
).merge(gdp_gr_df, left_on='gdp_cat', right_on='gdp_cat').sort_values('size', ascending=False)
sector_vs_gdp_df.head(10)
| sector | gdp_cat | funded_amount_mean | GKV | project_num | size | |
|---|---|---|---|---|---|---|
| 30 | Food | 2,5-5K USD | 589.589294 | 37645276.0 | 63850 | 282822 |
| 38 | Clothing | 2,5-5K USD | 1007.794189 | 9206200.0 | 9135 | 282822 |
| 31 | Transportation | 2,5-5K USD | 639.385132 | 4294750.0 | 6717 | 282822 |
| 32 | Arts | 2,5-5K USD | 992.281616 | 5123150.0 | 5163 | 282822 |
| 33 | Services | 2,5-5K USD | 990.285522 | 13473825.0 | 13606 | 282822 |
| 34 | Agriculture | 2,5-5K USD | 657.759399 | 45209776.0 | 68733 | 282822 |
| 36 | Wholesale | 2,5-5K USD | 1280.603394 | 222825.0 | 174 | 282822 |
| 37 | Retail | 2,5-5K USD | 502.159088 | 33671776.0 | 67054 | 282822 |
| 35 | Manufacturing | 2,5-5K USD | 819.133728 | 2321425.0 | 2834 | 282822 |
| 39 | Construction | 2,5-5K USD | 981.422546 | 2204275.0 | 2246 | 282822 |
sector_vs_gdp_df['sector_prc_prj'] = sector_vs_gdp_df['project_num'] / sector_vs_gdp_df['size'] * 100
sector_vs_gdp_df.sort_values(['sector', 'gdp_cat'], inplace=True)
fig = px.line(sector_vs_gdp_df, y= 'sector_prc_prj', x = 'sector', color='gdp_cat', markers=True ,
color_discrete_map={
'<1,5K USD':'#3d5a80',
'1,5-2,5K USD':'#98c1d9',
'2,5-5K USD':'#ee6c4d',
'> 5K USD':'#7F3D49',
})
# fig.update_xaxes(categoryorder = "total descending")
fig.update_traces(line=dict(width=1.75))
fig.show()
df_proverty.columns
Index(['funded_amount', 'loan_amount', 'activity', 'sector', 'use',
'country_code', 'country', 'currency', 'term_in_months', 'lender_count',
'borrower_genders', 'repayment_interval', 'has_description',
'num_of_borrower', 'num_male', 'num_female', 'difference_usd',
'difference_pct', 'success_type', 'Continent_Name',
'Three_Letter_Country_Code', 'gdp', 'percPoverty', 'pop', 'loan_cat',
'project_size', 'is_goal_achieved', 'poverty'],
dtype='object')
pop_params_df = df_proverty.groupby(
['Three_Letter_Country_Code', 'pop', 'country', 'percPoverty', 'Continent_Name'],
as_index=False, observed=True).agg(funded_amount_sum = ('funded_amount', 'sum'), project_num = ('loan_amount', 'size'))
pop_params_df['funded_amount_pro_person'] = pop_params_df['funded_amount_sum'] / pop_params_df['pop']
pop_params_df.sort_values('funded_amount_pro_person', ascending=False)
| Three_Letter_Country_Code | pop | country | percPoverty | Continent_Name | funded_amount_sum | project_num | funded_amount_pro_person | |
|---|---|---|---|---|---|---|---|---|
| 51 | WSM | 1.984100e+05 | Samoa | 20.299999 | Oceania | 5641825.0 | 7396 | 28.435185 |
| 49 | PRY | 7.132530e+06 | Paraguay | 23.500000 | South America | 29412700.0 | 11903 | 4.123740 |
| 48 | ARM | 2.963234e+06 | Armenia | 26.400000 | Europe | 11186675.0 | 8631 | 3.775157 |
| 4 | SLV | 6.486201e+06 | El Salvador | 22.799999 | North America | 23357724.0 | 39875 | 3.601141 |
| 14 | PSE | 4.803269e+06 | Palestine | 29.200001 | Asia | 12032025.0 | 8167 | 2.504966 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 63 | MRT | 4.649660e+06 | Mauritania | 31.000000 | Africa | 15000.0 | 1 | 0.003226 |
| 45 | BRA | 2.125594e+08 | Brazil | 26.500000 | South America | 661025.0 | 284 | 0.003110 |
| 78 | CIV | 2.637828e+07 | Cote D'Ivoire | 39.500000 | Africa | 50000.0 | 1 | 0.001895 |
| 74 | AFG | 3.892834e+07 | Afghanistan | 54.500000 | Asia | 14000.0 | 2 | 0.000360 |
| 72 | CHN | 1.410929e+09 | China | 0.600000 | Asia | 373475.0 | 134 | 0.000265 |
80 rows × 8 columns
pop_params_df['funded_amount_pro_person'].describe()
count 80.000000 mean 0.856095 std 3.231297 min 0.000265 25% 0.040818 50% 0.163661 75% 0.611959 max 28.435185 Name: funded_amount_pro_person, dtype: float64
pop_params_df['funded_amount_pro_person'].sort_values(ascending=False)
51 28.435185
49 4.123740
48 3.775157
4 3.601141
14 2.504966
...
63 0.003226
45 0.003110
78 0.001895
74 0.000360
72 0.000265
Name: funded_amount_pro_person, Length: 80, dtype: float64
pop_params_df.loc[51, :]
Three_Letter_Country_Code WSM pop 198410.0 country Samoa percPoverty 20.299999 Continent_Name Oceania funded_amount_sum 5641825.0 project_num 7396 funded_amount_pro_person 28.435185 Name: 51, dtype: object
Samoa - Ausreißer mit 28.43 USD pro Kopf, löschen für eine mehr übersichtliche Darstellung.
for_visual = pop_params_df.drop(labels=[51], axis=0)
fig = px.scatter(for_visual, x='percPoverty', y='funded_amount_pro_person', hover_name="country", size='project_num',
size_max=50, color='Continent_Name',
labels={'funded_amount_pro_person':'Kreditbetrag pro Person, USD',
'percPoverty' : 'Armutsindex',
'Continent_Name' : 'Kontinent'})
fig.show()
fig = px.choropleth(for_visual, locations="Three_Letter_Country_Code",
color="funded_amount_pro_person", # lifeExp is a column of gapminder
hover_name="country", # column to add to hover information
color_continuous_scale=px.colors.sequential.matter,
labels={'funded_amount_pro_person':'Kreditbetrag<br>pro Person, USD'})
fig.update_layout(title="Pro-Kopf-Kredit nach Ländern")
fig.show()
df.columns
Index(['funded_amount', 'loan_amount', 'activity', 'sector', 'use',
'country_code', 'country', 'currency', 'term_in_months', 'lender_count',
'borrower_genders', 'repayment_interval', 'has_description',
'num_of_borrower', 'num_male', 'num_female', 'difference_usd',
'difference_pct', 'success_type', 'Continent_Name',
'Three_Letter_Country_Code', 'gdp', 'percPoverty', 'pop', 'loan_cat',
'project_size', 'is_goal_achieved', 'gdp_cat'],
dtype='object')
for_visual = df.groupby('num_of_borrower', as_index=False).size().sort_values('num_of_borrower')
fig = px.scatter(for_visual, x='num_of_borrower', y='size',
log_y=True,
color_discrete_sequence=["#3d5a80"],
labels={'size': 'log. Anzahl Projekte', 'num_of_borrower':'Anzahl Kreditnehmer'})
fig.show()
round(for_visual.loc[for_visual.num_of_borrower == 1, 'size'] / df.shape[0]* 100, 2)
1 83.61 Name: size, dtype: float64
round(for_visual.loc[for_visual.num_of_borrower == 0, 'size'] / df.shape[0]* 100, 2)
0 0.63 Name: size, dtype: float64
--- Der Großteil der Projekte (83.61 %) wird von einzelnen Kreditnehmern vergeben.
--- Der Rest sind Gruppen von Kreditnehmern mit bis zu 50 Personen.
--- Bei 0.63% Prozent der Projekte liegen keine Informationen über die Kreditnehmer vor.
def get_borrower_sex(num_male, num_female):
if (num_male == 0) & (num_female > 0):
return 'F'
elif (num_female == 0) & (num_male > 0):
return 'M'
elif (num_female > 0) & (num_male > 0):
return 'FM'
else:
return 'U'
df['borrower_type'] = df.loc[:, ['num_male','num_female']].apply(lambda x: get_borrower_sex(x['num_male'], x['num_female']), axis=1)
df.loc[:, ['num_male', 'num_female', 'borrower_type']].head(3)
| num_male | num_female | borrower_type | |
|---|---|---|---|
| 0 | 0 | 1 | F |
| 1 | 0 | 2 | F |
| 2 | 0 | 1 | F |
for_visual = df.groupby('borrower_type', as_index=False).size()
fig = px.pie(
for_visual,
values='size',
names='borrower_type',
color='borrower_type',
title='',
color_discrete_map={
'F':'#3d5a80',
'M':'#98c1d9 ',
'FM':'#ee6c4d',
'U':'#293241'
}
)
fig.update_traces(textposition='auto', textinfo='percent + label')
fig.show()
--- 72,4 % der Kreditanfragen wurdenn von Frauen oder einer Gruppe von Frauen gestellt. Auf der anderen Seite haben Männer oder Gruppen von Männern nur 20,6 % der Anfragen gestellt.
f = sum(df['num_female'])
m = sum(df['num_male'])
g = m + f
print('Anzahl Frauen =', f, ',', round(f / g * 100, 2), '%')
print('Anzahl Männer =', m, ',', round(m / g * 100, 2), '%')
print('Verhaltnis =', round(f / m, 3))
Anzahl Frauen = 1071308 , 79.58 % Anzahl Männer = 274904 , 20.42 % Verhaltnis = 3.897
--- Unabhängig von der Zusammensetzung der Gruppen sind 79.58% der Kreditnehmer Frauen und nur 20.42% Männer.
---> Das müssen wir bei unserem Marketing berücksichtigen.
df_gender = df.groupby("num_of_borrower", as_index=False).agg(
number_males=("num_male", "sum"),
number_females=("num_female", "sum")
)
df_gender["ratio"] = df_gender["number_males"] / df_gender["number_females"]
df_gender.drop([0], axis = 0, inplace=True)
df_gender.reset_index(inplace=True, drop=True)
df_gender.head()
| num_of_borrower | number_males | number_females | ratio | |
|---|---|---|---|---|
| 0 | 1 | 134710.0 | 426502.0 | 0.315848 |
| 1 | 2 | 4992.0 | 27432.0 | 0.181977 |
| 2 | 3 | 7198.0 | 41402.0 | 0.173856 |
| 3 | 4 | 8298.0 | 45890.0 | 0.180824 |
| 4 | 5 | 12863.0 | 54572.0 | 0.235707 |
max(df_gender.ratio)
1.0025839793281655
# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])
# Add traces
fig.add_trace(
go.Scatter(x=df_gender['num_of_borrower'], y=df_gender['number_males'],
marker_color='#506280', mode="lines+markers",
name='Männer'),
secondary_y=False,
)
fig.add_trace(
go.Scatter(x=df_gender['num_of_borrower'], y=df_gender['number_females'],
marker_color='#F3927A', mode="lines+markers",
name='Frauen'),
secondary_y=False,
)
fig.add_trace(
go.Scatter(x=df_gender['num_of_borrower'], y=df_gender['ratio'],
name='Männer/Frauen Ratio', mode="lines+markers", opacity=0.4,
line=dict(color='#293241')),
secondary_y=True,
)
# Add figure title
fig.update_layout(
title_text="Anzahl der weiblichen und männlichen Kreditnehmer für jede Projektgröße (Anzahl der Kreditnehmer)"
)
fig.update_xaxes(title_text="Anzahl Kreditnehmer im Projekt", range=[0, 51])
fig.update_yaxes(title_text="log. Anzahl Kreditnehmer", type="log", secondary_y=False)
fig.update_yaxes(title_text="Männer / Frauen Ratio", secondary_y=True)#, range=[0, 1.1])
fig.update_layout(barmode='relative')
fig.show()
Das Verhältnis zwischen Männern und Frauen ist bei allen Projektgrößen (gemessen an der Zahl der Teilnehmer) weitgehend konstant und die Frauen dominieren zahlenmäßig völlig.
Dies zeigt uns, dass es für Frauen kein Problem ist, ihre eigenen Projekte ohne Männer zu starten, und sie die größte Kundengruppe sind, wenn wir die Kreditnehmer betrachten.
---> Dies kann dazu genutzt werden, unsere Marketingstrategie sowohl für Kreditnehmer als auch für Kreditgeber anzupassen. Wir unterstützen unabhängige Frauen bei der Gründung ihres eigenen Unternehmens.